static website

THOK.ORG itself

The biggest and last website to move over to new hardware was THOK.ORG itself. Bits of this website go back decades, to a slightly overclocked 486DX/25 on a DSL line - while static websites have some significant modern advantages, the classic roots are in "not actually having real hardware to run one". That said, it does have a lot of sentimental value, and a lot of personal memory - mainly personal project notes, for things like "palm pilot apps" or "what even is this new blogging thing" - so I do care about keeping it running, but at the same time am a little nervous about it.

(Spoiler warning: as of this posting, the conversion is complete and mostly uneventful, and I've made updates to the new site - this is just notes on some of the conversion process.)

Why is a static site complicated?

"static site" can mean a lot of things, but the basic one is that the web server itself only delivers files over http/https and doesn't do anything dynamic to actually deliver the content.¹ This has security benefits (you don't have privilege boundaries if there are no privileges) and run-time complexity benefits (for one example, you're only using the most well-tested paths through the server code) but it also has testing and reliability benefits - if you haven't changed anything in the content, you can reasonably expect that the server isn't going to do anything different with it, so if it worked before, it works now.

This also means that you will likely have a "build" step where you take the easiest-to-edit form and turn it into deliverable HTML. Great for testing - you can render locally, browse locally, and then push the result to the live site - but it does mean that you want some kind of local tooling, even if it's just the equivalent of find | xargs pandoc and a stylesheet.
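As a rough sketch of what that "find | xargs pandoc" tooling amounts to, here is a Python version that only plans the pandoc invocations for a source tree. The function name, the directory layout, and the `style.css` default are assumptions for illustration, not the actual THOK.ORG tooling.

```python
from pathlib import Path


def pandoc_commands(src_root, out_root, stylesheet="style.css"):
    """Return one pandoc argv list per Markdown file under src_root.

    Hypothetical helper: the output tree mirrors the source tree, with
    .md replaced by .html.
    """
    src_root, out_root = Path(src_root), Path(out_root)
    commands = []
    for md in sorted(src_root.rglob("*.md")):
        html = out_root / md.relative_to(src_root).with_suffix(".html")
        commands.append(
            ["pandoc", "--standalone", "--css", stylesheet,
             str(md), "--output", str(html)]
        )
    return commands
```

Each returned list can be handed to `subprocess.run(argv, check=True)` once the output directories exist; keeping the planning separate from the running makes it easy to check the plan locally before anything is pushed.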

For THOK.ORG, I cared very little about style and primarily wanted to put up words (and code snippets) - Markdown was the obvious choice, but it hadn't been invented yet! I was already in the habit of writing up project notes using a hotkey that dropped a username and datestamp marker in a file, and then various "rich text" conventions from 1990s email (nothing more than italic, bold, and code) - I wasn't even thinking of them as markup, just as conventions that people recognized in email without further rendering. So while the earliest versions of the site were just HTML, later ones were a little code to take "project log" files and expand them into blog-like entries. All very local, README → README.html and that was it.

Eventually I wrote a converter that turned the project logs into "proper" markdown - not a perfect one (while using a renderer helped bring my conventions in line with what rendered ok, I never managed to really formalize it and some stuff was just poorly rendered), just one that was good enough that I could clean up the markdown by hand and go all in on it. There was a "side trip" of using Tumblr as a convenient mobile blogging service - phone browsers were just good enough that I could write articles in markdown on a phone with a folding bluetooth keyboard at pycon.ca and get stuff online directly - I didn't actually stick with this and eventually converted them back to local markdown blogs (and then still didn't update them.)

Finally (2014 or so) I came up with a common unifying tool to drag bits of content together and do all of the processing for the content I'd produced over the years. thoksync included a dependency declaration system that allowed parallelized processing, and various performance hacks that have been overtaken by Moore's Law in the last decade. The main thing is that it was fast enough to run in a git post-update hook so when I pushed changes to markdown files, they'd get directly turned into live site updates. Since I was focussed on other things in the meantime (including a new startup in 2015) and the code worked I hadn't really touched it in the last decade... so it was still python 2 code.
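To illustrate why declared dependencies make parallel processing possible, here is a small sketch using the standard library's `graphlib` (Python 3.9+). This is not how thoksync is implemented; the target names and the `build_batches` helper are made up for the example.

```python
from graphlib import TopologicalSorter


def build_batches(deps):
    """Group targets into batches; each batch depends only on earlier ones,
    so everything inside one batch could be built in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical declarations: each target maps to the things it is built from.
deps = {
    "index.html": {"README.html", "notes.html"},
    "README.html": {"README"},
    "notes.html": {"notes.md"},
}
batches = build_batches(deps)
```

Here the two source files come first, the two rendered pages second, and the index that pulls them together last.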

Python 2 to Python 3 conversion

Having done a big chunk of work (including a lot of review, guidance, and debugging) on a python 3 conversion of a commercial code base, I was both familiar with the process and had not expected to ever need to touch it again - the product conversion itself was far later than was in any way reasonable, and most other companies would have been forced to convert sooner. It was a bit of a surprise to discover another 2000+ lines of python 2 code that was My Problem!

While there were only a few small CLI-tool tests in the code (which I was nonetheless glad to have) I did have the advantage of a "perfect" test suite - the entire thok.org site. All I had to do was make sure that the rendering from the python 3 code matched the output from the python 2 code - 80,000 lines of HTML that should be the same should be easy to review, right?

This theory worked out reasonably well at first - any time the partially converted code crashed, well, that was obviously something that needed fixing.
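A comparison like that can be sketched in a few lines of Python. This is an assumed approach rather than the tool actually used for the port: it walks two build output directories (the directory arguments and the function name are hypothetical) and reports files that are missing on either side or differ byte-for-byte.

```python
import filecmp
from pathlib import Path


def differing_files(old_root, new_root):
    """Compare two rendered site trees and report what does not match."""
    old_root, new_root = Path(old_root), Path(new_root)
    old = {p.relative_to(old_root) for p in old_root.rglob("*") if p.is_file()}
    new = {p.relative_to(new_root) for p in new_root.rglob("*") if p.is_file()}
    return {
        "only_old": sorted(p.as_posix() for p in old - new),
        "only_new": sorted(p.as_posix() for p in new - old),
        # shallow=False forces a content comparison instead of trusting stat()
        "changed": sorted(
            p.as_posix() for p in old & new
            if not filecmp.cmp(old_root / p, new_root / p, shallow=False)
        ),
    }
```

An empty report means the two renderings match; anything in "changed" is a candidate for a closer look with an ordinary text diff.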

Port the entire thok build from python2 to python3

(tested by diffing the built site with leap - found some corrupted images/large binaries)
* #! update
* print → print(), file= (in comments too)
* Popen(text=True)
* except/as
* drop (object)
* lost feedvalidator so minivalidate.py doesn't actually work yet
* file → open
* open → with open as
* rfc822 → email.utils (parsedate, formatdate)
* argument "tuple unpacking" is gone
* SimpleHTTPServer, BaseHTTPServer → http.server
* isinstance(basestring) → isinstance(str) (just to reverse-detect etree fragments)
* markdown.inlinepatterns.Pattern → InlineProcessor (old API exists but it made more sense to debug the new one)
   * etree no longer in markdown.util
   * grouping no longer mangled, so group(1) is correct
   * different return interface
   * add → register
* string hack for WikiLinkExtension arguments no longer works
* lxml.etree.tostring → encoding="unicode" in a few places to json-serialize sanely
   * in a few places, keep it bytes but open("w" → "wb") instead
* thokrss: dependency tracking → tracker (was *never* right, just untested __main__ code)
* python 2 allowed sorting functions by id; python 3 doesn't, so just extract the names in key=
* tumblr2thoksync: long → int
* transformer.py: remove a bunch of unused imports
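A few of the items in that list are easier to see in code. The following is a standalone illustration using only the standard library, not an excerpt from the thok build; the function names are placeholders.

```python
from email.utils import formatdate, parsedate

# rfc822.parsedate / rfc822.formatdate now live in email.utils.
stamp = parsedate("Fri, 07 Feb 2025 12:00:00 +0000")
epoch_header = formatdate(0, usegmt=True)


# Python 2 allowed "def span((start, end)):"; in Python 3 the tuple
# parameter has to be unpacked inside the function body instead.
def span(pair):
    start, end = pair
    return end - start


# Placeholder functions standing in for real build steps.
def render_blog(): ...
def build_index(): ...


# Python 2 would sort bare function objects; Python 3 raises TypeError,
# so sort on an explicit key such as the function name.
ordered = sorted([render_blog, build_index], key=lambda f: f.__name__)
```

Note that `parsedate` ignores the timezone offset (only the first six fields of the returned tuple are meaningful), which matters if the original code compared dates across zones.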

... get to the python rendering code ... point to staticsite ... mention nagaina

This definition of static doesn't preclude things with client-side javascript - I've seen one form of static site where the server delivered markdown files directly to the client and the javascript rendered them there, which is almost clever but requires some visible mess in the files, so I've never been that tempted; it would also mean implementing my own markdown extensions in javascript instead of python, and... no. ↩

2025-02-07

Topics: static website

Staticsite itself

Got far enough into staticsite that it was time to go beyond the basic blog, and the ice cream blog turns out to be a good testbed for that.

Fix the images

Images (specifically, jpg files from cameras or modern cellphones) are, by default, large and messy, despite staticsite doing clever things with img.srcset. It turns out that there's a stack of problems:

ImageMagick convert doesn't update (or discard) EXIF.width and EXIF.height when resizing, and later parts of the toolchain (probably including the browser itself) get misled by small images with large dimensions.
Certain parts of the staticsite markdown processing path end up giving absolute instead of relative links to the produced images (still looking for where though) and so if you make a local sandbox copy of the main site, some of the img files that the browser fetches actually come from the upstream live site instead of the sandbox, completely confusing your debugging process.
I really want the images to use bootstrap's img-fluid which I can add using the markdown "attributes" extension, which is already turned on, but I want it consistently site wide.

On top of that, it may turn out that the part of the problem I care about needs to be fixed in the python-markdown layer instead of staticsite itself, but it may just be "non-overridable python code"¹ rather than something I even can fix in a theme.

Current solutions:

github ticket #70 filed to describe the <img> problem and hostname part.
Use the python-markdown attribute extension {: class="img-fluid"} manually on all images, so that they scale-to-fit regardless of what processing they've been through.
Wrote a little icecream-start shop-name that takes kpa-grep output and fills in a blank markdown file with a title and filled in ![]() image includes for each image (so I can write the article and just delete the unneeded images as I go along - which will work better once #70 is fixed, for now half of the images go upstream instead of locally.)
Bigger hammer: icecream-start now uses jhead -autorot -purejpg² which just rotates them losslessly and wipes out any conflicting EXIF metadata. This, combined with img-fluid and a width-clamp in site.css, was the minimal "image-heavy pages are actually good now" set of changes.
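The stub-writing half of that workflow is simple enough to sketch. This is a guess at the shape of it, not the real icecream-start: the `article_stub` name and its arguments are hypothetical, and it only produces the markdown text (title plus one img-fluid image include per photo).

```python
from pathlib import Path


def article_stub(shop_name, image_paths):
    """Return markdown for a new post: a title, then one image per photo."""
    lines = [f"# {shop_name}", ""]
    for image in image_paths:
        # The attribute list applies bootstrap's img-fluid to this image.
        lines.append(f'![]({Path(image).name}){{: class="img-fluid"}}')
        lines.append("")
    return "\n".join(lines)


stub = article_stub("Example Creamery", ["/photos/a.jpg", "b.jpg"])
```

The jhead -autorot -purejpg pass would run over the same image files before the stub is written, so the includes point at already-cleaned copies.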

Finish taxonomy support

staticsite has Hugo-style taxonomies (to the point of linking to them for documentation.) It does a fine job building index pages, but stops there. The two follow-ons to make them useful are

Link those index pages in the navbar (or the sidebar, but for photo-heavy mobile use I find that the sidebar is an utter failure, so my first template effort was to turn that off and use full width ("12 column" in bootstrap terms))
The default page templates include the tags at the bottom, but only if they're from the tags taxonomy. Turns out we can just iterate over the available taxonomies and render all of them.

Current solutions:

navbar config is one line in the index.md metadata, done.
replacing the "tags for this article" with "all tags for all taxonomies for this article" was some simple nested loops in Jinja2, once I got past the scoping problem below.

A future possibility is to add some markup (possibly subverting the wikilinks syntax, or maybe just using links with a magic urltype) that lets me just use the tags in-line in the text without having to put them in the per-post metadata. (Future, not blocking for now, and ideally it would just be a hook into the same taxonomy plumbing.)

The template changes ran into some issues:

Jinja2 macros are file scoped, so an attempt to replace a single macro (like inline_page as called by pages) is silently ignored; instead you need to replace the entire file, including the otherwise unchanged calling macro (at which point you might consider giving up on extending the existing theme in the first place.)
Some of the ssite subcommands will parse a .staticsite.py or settings.py in the top level of the site source, which would let you configure a theme; important ones like ssite show ignore that entirely and require a --theme argument.
For a while this looked like "syntactically bad themes (or settings) were silently not imported"; that turns out not to be true, it just wasn't importing them at all because the config was ignored instead.
The existing settings aren't actually in-scope in the settings file; though you may be able to import the global settings, it's not clear that those are the correct ones after other processing.
Some of the data structures visible in the template act like strings but aren't strings - so for example, you can iterate over the taxonomies, and if you render that inline you get the names, but you can't then get the taxonomy from there because you end up attempting to use the object as a key and not the name. On top of that, python code in jinja2 templates has very limited access to python builtins - so you don't have dir or str (though you can simulate the latter with "" ~ var, it's not great.) Turns out that most of these objects have a .name you can use directly, but I haven't found good documentation for that - but at this point, I recognize it as a pattern, so "just try .name" is part of my experimentation repertoire.
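The "acts like a string but isn't one" trap is easy to reproduce in plain Python. The class below is a toy stand-in, not staticsite's actual taxonomy type, and the metadata dictionary is invented for the example.

```python
class Taxonomy:
    """Toy object that renders as its name but is not itself a string."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


page_meta = {"flavors": ["vanilla", "maple"], "towns": ["Concord"]}
taxonomies = [Taxonomy("flavors"), Taxonomy("towns")]


def tags_for(page_meta, taxonomies):
    """Collect this page's tags for every taxonomy, keyed by taxonomy name."""
    found = {}
    for tax in taxonomies:
        # page_meta.get(tax) would use the object as the key and find nothing;
        # the lookup has to go through tax.name.
        found[tax.name] = page_meta.get(tax.name, [])
    return found
```

Printing each taxonomy shows the expected names, which is what makes the failed lookup so confusing; going through `.name` is the same workaround described above for the templates.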

System dark mode

blag had what turns out to be really simple bits of CSS³ for a dark mode that turns on when the browser is in dark mode (usually triggered by "system" darkmode, through xsettings and GTK themes.) It's worth adding that to the staticsite theme if we can do it in a simple way.

Current solutions:

Within the theme directory, static/css/*.css get installed, so just copy the default site.css there and add extra files that it explicitly @imports.
Specifically, @import "bootstrap-color-fix.css" screen and (prefers-color-scheme: dark); isolates all of the horror - so providing a color mode is only one mechanical line of CSS.
To create that file, just copy /usr/share/javascript/bootstrap4/css/bootstrap.css (include attribution comments, it is MIT licensed) and delete everything that isn't a color, which gets it down to about 700 entries; then cook up a little elisp to "invert" a color string in the buffer. Yes, this is gruesomely brute force - but it's short term: bootstrap 5.3 has proper dark-mode support built in, so when staticsite upgrades (not something I'm prepared to tackle myself right now¹) we can just discard these changes and use that support instead. (I don't actually want any in-page controls for this, just automatic support for the viewer's system or in-browser choices.)

System dark mode followup (pre/code)

(2025-08-07 update) Turns out that staticsite's default-base base.html also brings in github.css for better coloring of code blocks, and it does it after site.css is included in the page, so it stomps on definitions in bootstrap-color-fix.css - or it would if it had any; https://thok.site is currently my only site with significant code blocks and I hadn't noticed the problem until a friend (possibly the only reader) pointed out that the wireguard article was a visual mess.

The straightforward fix is to just add a .localtheme/static/css/github.css with conditional imports for both light and dark versions of github.css, also included in that directory. (The dark version was just "take the light version and change each color byte to 255-x" - it is entirely possible that doing this in HSV space is better, but the Right Way is already known: wait until staticsite upgrades to bootstrap 5.3 or later, and then rip all of this out!)
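For anyone without a handy elisp snippet, the 255-x inversion can be sketched in Python. This is an illustration of the idea rather than the code that produced the files on the site, and it only handles `#rgb` and `#rrggbb` hex colors (not `rgb()`/`rgba()` values or named colors).

```python
import re

# Matches #rrggbb or #rgb; a six-hex-digit id selector would also match,
# so the output still deserves a look by eye.
HEX_COLOR = re.compile(r"#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def invert_hex(match):
    """Invert one hex color.

    Replacing each hex digit d with 15 - d is the same as replacing each
    byte x with 255 - x, and it also works for the short #rgb form.
    """
    digits = match.group(1)
    return "#" + "".join(format(15 - int(d, 16), "x") for d in digits)


def invert_css(text):
    """Return the stylesheet text with every hex color inverted."""
    return HEX_COLOR.sub(invert_hex, text)


sample = invert_css("color: #fff; background: #212529;")
```

Applying `invert_css` twice returns the original colors (lowercased), which is a quick sanity check that nothing was mangled.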

More markdown extensions

It's a little messy to even turn on extensions; the documentation (doc/reference/pages/markdown.md) says you can set MARKDOWN_EXTENSIONS but it doesn't actually say where, and see the problem above about things ignoring settings.py.

Aside from wikilinks for in-line taxonomy reference, I'd like to turn on whatever makes bare URLs into links; SO suggests just using <> which I'd forgotten, but also gives both a (mildly flawed) sample extension for it and a pointer to markdown2 which has link-patterns as a mechanism for this.

Geography

Saw Simon Willison's experiments with OpenFreeMap and MapLibre and realized it would be really easy to lay out my Ice Cream Journey on it. Not sure it's worth actually hosting an entire tileset (when by definition I only need Massachusetts), and later on I might just stash maps at various static zoom levels or something simple like that. For now, though, it's responsive and doesn't need an API key, and the Javascript interface is straightforward.

In fact, my use of the interface is probably too straightforward - rather than being generated from page metadata, there's just a hard-coded list of Names, markdown page names, and lat/long pairs, and two dozen lines of code to forEach the place list and create a maplibregl.Marker attached to a maplibregl.Popup for each; through the glory of Unicode, we can even have 🍨 markers for general ice cream and 🍦 for places that specialize in soft-serve. That all works fine, the only manual step is adding a single line of data to the map.html file for every review I do - technically moving it into per-page metadata wouldn't be less work, or more robust in any way, but it feels like the right place for it, so I'll get to that eventually.
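Moving the list into per-page metadata could look something like the sketch below, which turns page metadata into the JSON the map script would loop over. Everything here is an assumption: the metadata keys (`lat`, `lon`, `soft_serve`), the page names, and the coordinates are placeholders, and this is not wired into staticsite.

```python
import json


def place_list(pages):
    """Build marker data for the map page from per-page metadata."""
    places = []
    for page in pages:
        meta = page["meta"]
        if "lat" not in meta or "lon" not in meta:
            continue  # pages without coordinates simply get no marker
        places.append({
            "name": meta["title"],
            "page": page["path"],
            "lnglat": [meta["lon"], meta["lat"]],  # MapLibre wants [lng, lat]
            "icon": "🍦" if meta.get("soft_serve") else "🍨",
        })
    # ensure_ascii=False keeps the emoji readable in the generated file
    return json.dumps(places, ensure_ascii=False, indent=2)


# Made-up pages with placeholder coordinates.
pages = [
    {"path": "example-creamery", "meta": {"title": "Example Creamery", "lat": 42.0, "lon": -71.0}},
    {"path": "example-softserve", "meta": {"title": "Example Soft Serve", "lat": 42.5, "lon": -71.5, "soft_serve": True}},
    {"path": "about", "meta": {"title": "About"}},
]
places_json = place_list(pages)
```

The existing forEach loop would then read this list instead of the hard-coded one, with the longitude-first order already matching what a maplibregl.Marker expects.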

Since this is still an experiment, I didn't want to just have "Map" in the navbar, I wanted a specific experimental marker in the title. The definition of the navbar is just a list in the metadata of index.md itself, but the titles are expected to be in the metadata of each of those pages - the main trick here is that raw html files aren't, they're actually J2Page Jinja2 templates, so you can stuff a {% block front_matter %} inside an HTML comment, and that works as a clean way to hide the metadata.⁴

Page Width

One final issue (and one of the only design aspects I've gotten feedback about from readers⁵) is that on a wide screen, the pictures are too huge and the text ends up ridiculously wide. It took decades but the web design industry did realize that the newspaper industry's use of narrow columns was good for reading,⁶ but Bootstrap itself doesn't appear to have any useful defaults for this (or even any good stackoverflow answers.) All it needs is

@media (min-width: 40em) {
    .container-fluid {
        width: 40em;
    }
}

(adjust 40em to taste, but probably keep it in character-width units to stay consistent with other user preference choices.) All this declares is that if the screen is 40em wide or larger, set the outermost bootstrap container width to 40em; this keeps smaller size layouts unchanged, and breaks smoothly as you get larger.

It's open source python code, everything is overridable, but for me it's a big step towards just writing a new engine (or adding these features to one of my old ones) which I'm specifically shying away from in this moment. ↩↩
github:Matthias-Wandel/jhead, yes, that Matthias Wandel of youtube woodworking fame. ↩
See blag style.css for the prefers-color-scheme conditionals in @media stanzas; a mere 8 lines for each scheme. ↩
This trick doesn't appear to work for generated references, so while I can add archive to the nav list, it gets the site title instead... currently worked around with a querySelector.textContent assignment in a DOMContentLoaded function in the blog.html and page.html templates, but ironically that doesn't fix the archive page itself. ↩
Both of them! Dark mode, on the other hand, was entirely implemented for me personally, and worth the effort to get working when I was still looking at the site in draft, regardless of anyone else ever seeing it. ↩
Even though it had very little to do with that and was more of an artifact of how to assemble type in frames for printing, up through linotype and phototypesetting in column inches that were literally pasted up. ↩

2024-10-13

Topics: bloggery static website

Static website for blogs and more

My earlier attempts to distill blogging (and blog creation) down from a software and sysadmin task to "just name something and start writing" have kind of failed, but as I'm shuffling around hardware and feeling inspired to procrastinate by writing, I'm doing another pass.

Given that I'm python-oriented, I wanted something primarily in python, open source, with extra points for "maintained in Debian" and "I haven't failed to use it previously."

Blag

blag is maintained by a Debian developer, easy to get launched, is named after an XKCD comic, and I actually put 3 draft blogs together with it in a couple of days before trying the next thing. (In particular, I had one site that was going to be mostly collected essays with some blog bits, and not primarily a blog; though I still wanted an index and RSS and tagging, I had some trouble reorganizing that one into the right shape.)

Definitely still worth a look, especially for anything "actually blog shaped" - I had filled half a whiteboard with notes on what I actually wanted before I stumbled on the next candidate, so it was very helpful in getting me to define what I meant by "static site blogging" and how that was different from what I thought I meant. Unlike many of the other systems discussed here, the developer actually notices github issues, which is commendable.

Staticsite

staticsite caught my eye in an odd sort of way - it's still a markdown blog with other features, an instant-blog tutorial (doc/tutorial/blog.md), and some obvious tooling. What stood out was that it had Hugo-inspired taxonomy support - when tags aren't enough but you want kinds of tags, this lets you name and label a group, and have automatic lists of pages in the navbar, just by using them (and creating one two-line file.) This was attractive, especially for my ice cream blog which is itself completely serious but also serves as a playground for tooling and rendering ideas; ice cream shops have flavors, towns, and novelties and I can just drop a little metadata on each page.

(2024-08-07 side note: still fixing some details like actually including those on the pages themselves like tags are¹, doing user defined themes² at all, and fixing the image handling³; I'm not stuck on any of those, just merely-part-way into them.)

(2024-08-21 side note: fixed the above and I'm using it live - see staticsite-itself for more in-depth usage and customization.)

Others

Others I've glanced at - didn't really dismiss, they just didn't end up on the fast-path before I got to staticsite:

Pelican

pelican is in Debian, and the initial description starts with metadata in a post; this wasn't originally an objectionable issue, but after using blag and staticsite I find I really want a minimal post to need no more than a # title (though I certainly want to be able to add metadata later, that's "being organizational", not "blogging", and is minor unexpected friction.) Is this excessive? Certainly, but I'm also someone that recommends that developers learn to touch-type (and pick an editor) early in their careers - I'm already committed to being excessive about flow and friction.

Nikola

Nikola: python, markdown, MathJax; also heavy on the required metadata (and seems to require a new_post command. ssite new is similar but optional, and is really just a generic "run a template for me" tool.) Looks very featureful; I was just in the mood for something with less rope.

Hyde

Hyde is named as a pun on Jekyll (a popular github-pages-capable ruby static site tool) - not in Debian, is on pypi but last release was 9 years ago, the description page has many dead links, and doesn't yet have a completed python3 port.

other sources

https://wiki.python.org/moin/StaticSiteGenerator (I'd forgotten for a moment that moinmoin itself is not static, I used to use it for a homedir-only wiki though.) staticjinja is on this list and is even more minimal/"raw templating" than staticsite; not so much a recommendation as a point on the curve describing the shape of these things.
https://www.reddit.com/r/Python/comments/rja4l2/what_is_the_best_python_static_site_generator/ turns up high in google but the only relevant bits are Sphinx, Lektor (newish) and mkdocs-material.
https://jamstack.org/generators/ looks exhaustive, to the point of including a number of long-dead examples among the currently 54 listed (and staticsite itself isn't on the python list.) The backing repo has a pile of untouched pull requests, so it's likely to stay out of date.

the main blog template renders tags in-line but doesn't automatically notice taxonomies (or better yet, taxonomies mentioned in the nav bar.) ↩
turns out that ssite show ignores .staticsite.py so you can't set an explicit path to a theme, but it takes a --theme argument; misleadingly, ssite shell does read the settings. There are probably 2 or 3 issues here, I'm just not sure which ones are real (the "show ignores settings" bit might just be an under-documented security concern) and haven't filed them yet. ↩
recently figured out that ImageMagick convert -resize produces a smaller JPEG, but doesn't update the EXIF data which definitely misleads the browser, and is probably also misleading ssite when it generates the smaller images (since it also doesn't discard the EXIF data.) Again, still needs a couple of experiments where I do clean up and let it re-run before deciding which parts are actually issues. (In the end, I stomped on the native size-handling with bootstrap's img-fluid.) ↩

2024-10-13

Topics: bloggery static website